How Long Is Long-Term Data Storage?
Abstract
In the context of archiving physical documents, long-term storage has long been accepted to mean centuries. Digital documents are much more ephemeral, so archivists should be aware of the inherent limitations of the technologies available for preserving digital data. This paper compiles the results of several studies on this subject and presents new findings on what can be expected for recordable optical discs (CDs and DVDs). The bottom line is that, with one notable exception, digital data cannot be expected to endure using any existing technology.

Introduction

Less than a decade ago, this author purchased his first digital camera. Only a couple of years later, he had hundreds of personally valuable photos, all stored on his laptop’s magnetic hard-disk drive (HDD). It was then that he asked how best to preserve these pictures for future generations. Being intimately familiar with all the types of storage available, he was distraught to realize that no way existed to preserve these pictures for at least 100 years, preferably much longer. It was also then that he began research into this problem.

Determining how long something will last has long been a very important area of study for science and technology, particularly in materials and coatings. Many advances have been made, and much is known today about how to reliably predict the life expectancy (LE) of a product, based on the materials used to make it and the conditions of its use. These advances are readily applied to the field of data storage.

Causes of Failure

The most common failure mechanisms for materials (excluding mechanical wear) include oxidation, corrosion, and breaking of chemical bonds. Each of these failure mechanisms is exacerbated by elevated temperature, humidity, and exposure to light. That is why any controlled environment intended for archival storage always includes controlled temperature, humidity, and light.
These same failure mechanisms come into play when we consider how to store digital data. There are three basic technologies available for storing digital data: magnetic (including magnetic tape and hard-disk drives), solid-state (consisting primarily of flash memory), and optical (including CDs, DVDs, and Blu-ray discs (BDs)). Each of these technologies uses well-known processes and materials to manufacture the storage media, and each has known failure mechanisms, which have been studied.

Predicting Life Expectancies – Magnetic Tape

In his paper, “Predicting the Life Expectancy of Modern Tape and Optical Media” [1], Vivek Navale looked at multiple studies on LE for magnetic tape [2, 3], which included IBM 3480 and 3590 data cartridges, DLT IV cartridges, SuperDLT cartridges, 8mm data cassettes, D1 and D2 digital video cassettes, and standard VHS videocassettes. According to one of these studies, “Every tape type showed a loss in magnetization when they were under induced stress conditions of higher temperature and relative humidity.” [1] Based on their results, they reported LEs ranging from 10 to 200 years (depending on the type of tape used) if stored at 30°C; this range decreased dramatically to 0.7 to 7 years if stored at 60°C.

Navale also showed results from testing these same magnetic tapes at 50°C, with various levels of relative humidity (RH). At RH = 20%, the range was from about 0.6 to 2.8 years; at RH = 80%, this decreased dramatically to a range of about 0.3 to 0.9 years.

As previously stated, these LE calculations were based on the measured decrease in magnetization. Another commonly used measure for LE calculations is the errors before correction, reported on magnetic tape as the Block Error Rate, or BLER. The studies showed a clear increase in the digital errors while these tapes were stored at 40°C/50% RH. This increase means that these tapes would eventually fail to read the data back correctly.
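
The temperature figures quoted above for magnetization loss (10 to 200 years at 30°C, falling to 0.7 to 7 years at 60°C) can be turned into a rough acceleration factor. The sketch below assumes a simple Arrhenius model, which is a common choice in accelerated-aging work but is not stated by the cited studies. It also assumes that the low ends of the two ranges belong to the same tape type, and likewise the high ends, which the text does not confirm. All function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K


def acceleration_factor(le_cool_years, le_hot_years):
    """Ratio of life expectancy at the cooler temperature to that at the hotter one."""
    return le_cool_years / le_hot_years


def apparent_activation_energy_ev(le_cool_years, le_hot_years, t_cool_c, t_hot_c):
    """Apparent activation energy implied by two LE values.

    ASSUMPTION: LE(T) is proportional to exp(Ea / (k * T)), with T in kelvin
    (an Arrhenius law). The cited studies are not claimed to use this model.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    ratio = acceleration_factor(le_cool_years, le_hot_years)
    return BOLTZMANN_EV_PER_K * math.log(ratio) / (1.0 / t_cool_k - 1.0 / t_hot_k)


# LE ranges quoted in the text; pairing low end with low end is an assumption.
short_lived = apparent_activation_energy_ev(10.0, 0.7, 30.0, 60.0)
long_lived = apparent_activation_energy_ev(200.0, 7.0, 30.0, 60.0)

print(f"acceleration from 30C to 60C: {acceleration_factor(10.0, 0.7):.1f}x "
      f"to {acceleration_factor(200.0, 7.0):.1f}x")
print(f"apparent activation energy: {short_lived:.2f} eV to {long_lived:.2f} eV")
```

Under these assumptions, a 30°C rise shortens the life expectancy by roughly a factor of 14 to 29, which illustrates why storage temperature dominates any archival plan for tape.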
The LEs calculated from BLER ranged from 9.3 years to 1083 years, a huge range, but Navale pointed out that this prediction depends on the ability of the error-correction coding (ECC) to correct the errors, and is therefore somewhat subjective.

Predicting Life Expectancies – Hard-Disk Drives

Hard-disk drives (HDDs) store the majority of the digital data in the world today: an estimated 1 Zettabyte (about 10^21 bytes) [4], an amount that is beyond our ability to comprehend. Unfortunately, most computer users are all too familiar with the fact that HDDs have a nasty tendency to fail, and to do so catastrophically. But what can be said for their LE?

The best way to predict the LE would be to monitor many HDDs for long enough to see their characteristic failure statistics. Just such a study was done a few years ago. In their 2007 study, Pinheiro et al. [5] reported on a very large population of HDDs in service at Google, Inc.: over 100,000 of them. They gathered data on environmental factors (such as temperature), as well as the many parameters reported through the self-monitoring and analysis software deployed on their entire system. Figure 1 shows the Annualized Failure Rates (AFR) for these HDDs.

Archiving 2011 Final Program and Proceedings

Figure 1: Annualized failure rates broken down by age groups.

Figure 2: Utilization AFR for the Pinheiro et al. study.

The data reported in their study is insufficient to project an average expected LE for their population of HDDs, but it is obvious from Figure 1 that there is a fairly wide distribution, and that some drives (about 2.5%) fail as early as 3 months. This clearly eliminates HDDs as an archival storage option.

Another interesting finding in the Pinheiro et al. study is that they found no correlation between utilization and AFR. This is counterintuitive, but Figure 2 shows the AFR as a function of utilization.
Pinheiro et al. categorized utilization into three levels: low corresponds to the lowest 25th percentile, medium to the 50th-75th percentile, and high to the top 75th percentile. Figure 2 shows no correlation between utilization and AFR, which means that simply making sure an HDD is rarely used is not guaranteed to ensure it will last longer.

Predicting Life Expectancies – Flash Memory

In the very early days of non-volatile solid-state memory, around 1984, EEPROM made a big splash by being byte-level erasable. In those early days, it was fairly well understood that, because the data was stored as the charge on a very small, somewhat leaky capacitor (the floating gate), the Mean Time to Data Loss (MTTDL) for EEPROM was in the range of 10-12 years. Since then, many changes have been made, and Flash has become the dominant form of EEPROM. Densities have risen dramatically, from the early 256kb capacities to today’s 8GB capacities, all in a single chip.

But even in today’s Flash memory, the MTTDL has not changed much, because of the intrinsic way in which data is stored. For example, in their 2008 article, Kaneko et al. [6] report that the MTTDL for a Flash SSD (solid-state drive) is approximately 13 years. While the MTTF (Mean Time to Failure) of the devices themselves is much longer than the MTTDL (over 100 years), the issue is that, without active management, the data on SSDs literally evaporates with time, and that evaporation time is well understood. The same is true for Flash memory sticks, also known as jump drives, USB drives, or USB sticks: they all store data for only about 10-12 years, since they all use the same basic floating-gate architecture for storing each bit.
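
To give a feel for what an MTTDL of about 13 years implies, the sketch below assumes that the time to data loss is exponentially distributed (a constant hazard rate). That is a simplifying assumption of this example, not a claim made by Kaneko et al., and the function name is illustrative.

```python
import math


def probability_of_data_loss(years_stored, mttdl_years):
    """Probability that data has been lost by `years_stored`.

    ASSUMPTION: time to data loss is exponentially distributed with mean
    `mttdl_years`. Real flash retention depends on wear, temperature, and
    cell design, and need not follow this model.
    """
    if years_stored < 0 or mttdl_years <= 0:
        raise ValueError("years_stored must be >= 0 and mttdl_years must be > 0")
    return 1.0 - math.exp(-years_stored / mttdl_years)


MTTDL_YEARS = 13.0  # approximate figure quoted in the text for a flash SSD

for years in (1, 5, 13, 25, 100):
    p = probability_of_data_loss(years, MTTDL_YEARS)
    print(f"{years:>3} years unattended: {p:.1%} chance of data loss")
```

Under this model, about 63% of unmanaged devices would already have lost data by the time the MTTDL itself has elapsed, and essentially all of them would have done so well before the 100-year mark.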
Predicting Life Expectancies – Stamped Optical Discs

Data on stamped optical discs, whether CDs, DVDs, or BDs, is recorded at the time the discs are manufactured, and cannot be altered by the optical disc drives. This type of optical disc is commonly referred to as ROM (Read-Only Memory), since it cannot be recorded by the user. Figure 3 gives an example of the data structures on a CD-ROM, as seen with a scanning electron microscope (SEM). These structures are nearly impervious to change, except under extreme conditions. In his 2005 paper, Navale [1] reported the LE of CD-ROMs to range from 20 to 12,000 years, with a mean LE of 1592 years.

Figure 3: SEM image of the reflective layer bumps on a commercial CD.

©2011 Society for Imaging Science and Technology

Figure 4: SEM image of an unrecorded CD-R.

While the mean is outstanding, a distribution that ranges as low as 20 years is very problematic. Clearly, for archival purposes, research is needed to determine the causes of the early failures. If these causes can be addressed, and the lower end of this distribution fixed, this format of digital data storage could easily be the longest lasting of all current options.

As appealing as it is for a storage medium to have an LE of over 1,500 years, it is simply not practical for most users. The recording process is the manufacturing process, which means the equipment is very costly and the approach is completely impractical for low volumes.

Predicting Life Expectancies – Recordable Optical Discs

Several publications have addressed this area, and with good reason. Recordable optical discs, and the drives to read and record them, are widely available, inexpensive, easily transported, and almost ubiquitous. Billions of them are sold every year, in all three densities (CD, DVD, and BD). With that many advantages, they are a strong candidate for archival storage, but only if the LE of the data is sufficiently long.
Recordable optical discs use a very different data storage mechanism than stamped optical discs. Figures 4 and 5 show this very effectively: the same instrument was used to image an unrecorded and a recorded CD-R disc. It is obvious that the recording process has physically altered the tracks, but it is also obvious that the data is not to be found in these physical alterations, for there are no discernible patterns.

Recordable optical discs use a layer structure that is very different from that of ROM discs; this structure is shown in Figure 6. The dye is the optically active component of this structure. The dye is normally a poor reflector of light, as seen in the unrecorded portion of Figure 7. When it is illuminated by the right wavelength of laser light, the dye becomes much more reflective, as seen in the recorded portion of Figure 7.

Figure 5: SEM image of a recorded CD-R.

Figure 6: Layer structure of a CD-R optical disc. (from Worthington)

Figure 7: Unrecorded (right side) and recorded portions of a CD-R as seen through a confocal microscope.

Figure 8: PI8 Max average by manufacturer including dead discs.

The dye used is necessarily very sensitive to light, as it must respond to the laser light in only a few nanoseconds. While this is great for making practical recordable optical discs, it has some serious implications for archival handling. If recordable discs are not stored in dark conditions, this dye will degrade, and the recorded data will begin to fade. This degradation mechanism is commonly referred to as dye fading, and is well known in the optical storage industry.

Research at the National Institute of Standards and Technology [8], published in 2004, looked at a set of seven brands of recordable CDs and DVDs randomly selected from the commercial market.
These discs were subjected to conditions of accelerated aging, consisting of either elevated temperature and humidity (various combinations of 60°C to 90°C and 70% RH to 90% RH) or metal-halide (full-spectrum) light. The researchers monitored the digital error parameters of BLER for CDs, and PIE (PI sum 8) for DVDs. After 500 hours of accelerated aging in elevated temperature and humidity, all brands of CDs had exceeded the BLER limit of 220. After 1000 hours of accelerated aging in full-spectrum light, all but two brands of CDs had exceeded the same BLER limit. Of the three brands of recordable DVDs studied, two had exceeded the PIE limit of 280 after 250 hours in full-spectrum light; the same two brands exceeded this PIE limit after 125 hours at elevated temperature and humidity. Their basic conclusion was that: “Depending on the media type and intensity of the light, a disc may fail due to exposure to direct sunlight in as little as a few weeks. This will be especially true when coupled with the heating effect of exposure to sunlight or combined with any other heat source.” (Slattery et al., p. 523)

In their 2004 paper, Shahani et al. [9] studied randomly selected CDs from a collection of over 60,000 CDs, to determine whether the digital errors on these discs were increasing. This was determined by monitoring the Block Error Rate (BLER). They noted that the average BLER had increased from 70.5 (in 1996) to 72.4 (in 1999), and to 74.4 (in 2003). While none of these values exceeded the maximum specification of 220, the steady upward trend was a concern.

Of particular interest in the Shahani et al. study was their characterization and pictures of the failure modes of the discs in their collection. These failure modes were corrosion of the metal layer, oxidation of the reflective layer, and delamination. The discs included in this study were mostly CD-ROMs, so dye fading was not a factor.
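
The BLER trend reported by Shahani et al. can be put in perspective with a straight-line fit. The sketch below assumes that the three reported averages are directly comparable and that the growth stays linear, neither of which the study establishes. Degradation often accelerates, so the result is at best an optimistic bound. The helper name is illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


BLER_LIMIT = 220  # maximum BLER specification cited in the text

years = [1996, 1999, 2003]
average_bler = [70.5, 72.4, 74.4]  # averages reported in the text

slope, intercept = fit_line(years, average_bler)

# ASSUMPTION: the linear trend continues unchanged until the limit is reached.
crossing_year = (BLER_LIMIT - intercept) / slope

print(f"BLER rises by about {slope:.2f} per year")
print(f"a linear trend would reach {BLER_LIMIT} around the year {crossing_year:.0f}")
```

On a purely linear trend the collection average would stay under the limit for more than two centuries, which is why the concern lies with the direction of the trend and with the discs at the poor end of the distribution rather than with the average itself.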
The other degradation modes (corrosion, oxidation, and delamination), however, have been shown to be definitive for optical discs in general.

Another very significant study on recordable DVDs was released in 2009 by Svrcek of the Naval Air Warfare Center Weapons Division in China Lake, CA [10]. The study tested 25 discs from each of six brands of DVDs: Delkin, MAM-A, Mitsubishi, and Verbatim (all archival-quality DVDs), Taiyo-Yuden (a top-rated standard-quality DVD), and Millenniata (advertised to be truly permanent). These 150 DVDs were subjected to accelerated aging conditions of 85°C, 85% RH, and 1120 W/m² of full-spectrum light, all simultaneously. After only 48 hours of such testing, the results were: “All dye-based discs failed according to the ECMA PI8 max limit of 280. The post-test error statistics show all Millenniata discs pass the ECMA standard. The data recorded on these disks was recoverable. The Millenniata disks were the only ones tested that maintained information integrity.” (p. 45) Figure 8 shows their summary graph of the PI8 max values, clearly showing the degradation of the data on five of the brands of DVDs tested.

Based on the preceding research, it is apparent that, with one notable exception, recordable CDs and DVDs are currently not in a position to serve as a permanent storage solution for digital data. The one notable exception is the Millenniata discs tested in the Svrcek report.

Life Expectancy Summary

The answer to the question that constitutes the title of this paper is found in Table 1. These figures are derived from Navale [1], Van Bogart [2], Pinheiro [5], Slattery [8], Shahani [9], Byers [11], Iraci [12], and Tanaka [13]. The LE values are given as ranges because many values are reported in these studies. These values are also approximate, for the same reason.

Table 1: Life expectancy for data stored on today's media.
Media                          Life Expectancy of Data
Magnetic tape                  10-50 years
Magnetic hard-disk drives      1-7 years
Flash drives and Solid-state
Publication date: 2011